The urllib.request module provides an API for using Internet resources identified by URLs. It is designed to be extended by individual applications to support new protocols or add variations to existing protocols (such as handling HTTP basic authentication).
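The examples in this section talk to a test server listening on http://localhost:8080/ that echoes each incoming request back to the client. That server is not reproduced here; the following sketch is a hypothetical stand-in (the handler class and its exact output format are assumptions, not the server used to produce any particular sample output) that is sufficient to run the examples locally.
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    # Hypothetical stand-in: echo the request line, headers,
    # and any body back to the client as plain text.

    def _respond(self, body):
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.end_headers()
        self.wfile.write(body.encode('utf-8'))

    def do_GET(self):
        self._respond('GET {}\n{}'.format(self.path, self.headers))

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(length).decode('utf-8')
        self._respond('POST {}\n{}\n{}'.format(
            self.path, self.headers, body))


if __name__ == '__main__':
    HTTPServer(('localhost', 8080), EchoHandler).serve_forever()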
In [1]:
from urllib import request
response = request.urlopen('http://localhost:8080/')
print('RESPONSE:', response)
print('URL :', response.geturl())
headers = response.info()
print('DATE :', headers['date'])
print('HEADERS :')
print('---------')
print(headers)
data = response.read().decode('utf-8')
print('LENGTH :', len(data))
print('DATA :')
print('---------')
print(data)
The file-like object returned by urlopen() is iterable:
In [2]:
from urllib import request
response = request.urlopen('http://localhost:8080/')
for line in response:
    print(line.decode('utf-8').rstrip())
Arguments can be encoded with urllib.parse.urlencode() and appended to the URL to pass them to the server as a query string.
In [3]:
from urllib import parse
from urllib import request
query_args = {'q': 'query string', 'foo': 'bar'}
encoded_args = parse.urlencode(query_args)
print('Encoded:', encoded_args)
url = 'http://localhost:8080/?' + encoded_args
print(request.urlopen(url).read().decode('utf-8'))
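When an argument value is a sequence, urlencode() can expand it into repeated query parameters. This brief sketch stands apart from the example above and relies only on documented urllib.parse behavior:
from urllib import parse

# By default a list value is encoded as the quoted repr of the list.
print(parse.urlencode({'foo': ['a', 'b']}))
# With doseq=True the parameter is repeated once per value: foo=a&foo=b.
print(parse.urlencode({'foo': ['a', 'b']}, doseq=True))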
To send form-encoded data to the remote server using POST instead of GET, pass the encoded query arguments as data to urlopen().
In [4]:
from urllib import parse
from urllib import request
query_args = {'q': 'query string', 'foo': 'bar'}
encoded_args = parse.urlencode(query_args).encode('utf-8')
url = 'http://localhost:8080/'
print(request.urlopen(url, encoded_args).read().decode('utf-8'))
urlopen() is a convenience function that hides some of the details of how the request is made and handled. More precise control is possible by using a Request instance directly. For example, custom headers can be added to the outgoing request to control the format of data returned, specify the version of a document cached locally, and tell the remote server the name of the software client communicating with it.
As the output from the earlier examples shows, the default User-agent header value is made up of the constant Python-urllib, followed by the Python interpreter version. When creating an application that will access web resources owned by someone else, it is courteous to include real user agent information in the requests, so they can identify the source of the hits more easily. Using a custom agent also allows them to control crawlers using a robots.txt file (see the urllib.robotparser module, and the sketch after the next example).
In [5]:
from urllib import request
r = request.Request('http://localhost:8080/')
r.add_header(
    'User-agent',
    'PyMOTW (https://pymotw.com/)',
)
response = request.urlopen(r)
data = response.read().decode('utf-8')
print(data)
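The robots.txt check mentioned above can be automated with urllib.robotparser. A minimal sketch, assuming the server publishes a robots.txt file (the URL here is illustrative):
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url('http://localhost:8080/robots.txt')
parser.read()  # fetch and parse the robots.txt file

# can_fetch() reports whether the named agent may retrieve the URL.
print(parser.can_fetch('PyMOTW', 'http://localhost:8080/'))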
In [6]:
from urllib import parse
from urllib import request
query_args = {'q': 'query string', 'foo': 'bar'}
r = request.Request(
    url='http://localhost:8080/',
    data=parse.urlencode(query_args).encode('utf-8'),
)
print('Request method :', r.get_method())
r.add_header(
    'User-agent',
    'PyMOTW (https://pymotw.com/)',
)
print()
print('OUTGOING DATA:')
print(r.data)
print()
print('SERVER RESPONSE:')
print(request.urlopen(r).read().decode('utf-8'))
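To confirm which headers a prepared Request will carry, it exposes header_items() and get_header(). A short standalone check (the URL and agent string mirror the examples above):
from urllib import request

r = request.Request('http://localhost:8080/')
r.add_header('User-agent', 'PyMOTW (https://pymotw.com/)')

# header_items() lists only the headers added so far; urlopen()
# supplies others (such as Host) when the request is actually sent.
print(r.header_items())
print(r.get_header('User-agent'))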